
    Using R and Bioconductor for proteomics data analysis.

    This review presents how R, the popular statistical environment and programming language, can be used in the context of proteomics data analysis. A short introduction to R is given, with special emphasis on some of the features that make R and its add-on packages premium software for sound and reproducible data analysis. The reader is also advised on how to find relevant R software for proteomics. Several use cases are then presented, illustrating data input/output, quality control, quantitative proteomics and data analysis. Detailed code and additional links to extensive documentation are available in the freely available companion package RforProteomics. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
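
    The companion package mentioned above ships runnable code for each use case. As a minimal, hedged sketch (not code from the article itself), RforProteomics can typically be installed from Bioconductor and its executable documentation browsed as follows; only the package name is taken from the abstract, the rest is a generic Bioconductor workflow.

        ## Install the Bioconductor package manager, then the RforProteomics
        ## companion package.
        install.packages("BiocManager")
        BiocManager::install("RforProteomics")

        ## Load the package and open its vignettes, which contain the detailed
        ## code referred to in the abstract.
        library("RforProteomics")
        browseVignettes("RforProteomics")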

    Revisiting the thorny issue of missing values in single-cell proteomics

    Missing values are a notable challenge when analysing mass spectrometry-based proteomics data. While the field is still actively debating best practices, the challenge has increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to dealing with missing values is imputation. Imputation has several drawbacks for which alternatives exist, but it remains a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight five main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether through imputation or data modelling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values and for proper encoding of missing values. Comment: The code to reproduce the images presented in the manuscript is available in the GitHub repository: https://github.com/UCLouvain-CBIO/2023_scp_n
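
    On the encoding point above, a minimal sketch in base R (our own illustration, with an invented toy matrix) shows why missing values should be stored as NA rather than zero, and how the missingness rate can then be reported per feature.

        ## Toy peptide-by-cell intensity matrix in which an upstream tool
        ## encoded missing values as 0 (invented data).
        x <- matrix(c(1.2, 0.0, 3.1,
                      0.0, 2.4, 0.0,
                      5.0, 4.8, 4.9),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("pep", 1:3), paste0("cell", 1:3)))

        ## Recommended encoding: represent missing values as NA so they cannot
        ## be confused with true zero intensities in downstream statistics.
        x[x == 0] <- NA

        ## Report missingness per peptide and overall instead of hiding it.
        rowMeans(is.na(x))
        mean(is.na(x))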

    Towards reproducible MSMS data preprocessing, quality control and quantification

    The development of MSnbase aims at providing researchers dealing with labelled quantitative proteomics data with a transparent, portable, extensible and open-source collaborative framework to easily manipulate and analyse MS2-level raw tandem mass spectrometry data. The implementation in R gives users and developers a great variety of powerful tools to be used in a controlled and reproducible way. Furthermore, MSnbase has been developed following an object-oriented programming paradigm: all information that is manipulated by the user is encapsulated in ad hoc data containers to hide its underlying complexity. We illustrate the usage and achievements of our software using a published spiked-in data set in which varying quantities of test proteins have been labelled with four different iTRAQ tags. In addition to providing raw MSMS data, MSnbase also stores meta-data and logs processing steps in the data object itself for optimal traceability. We illustrate graphically how to inspect precursor data for quality control and how individual or merged MSMS spectra can subsequently be processed, plotted and extracted using a variety of methods. We also demonstrate how reporter ions (or any peaks of interest defined by the user) can easily be quantified and normalised using several built-in alternative strategies and how the effect of each transformation can be recorded, examined and reproduced. MSnbase constitutes a unique versatile working and development environment to process labelled MSMS data and provides in turn important feedback for data acquisition optimisation. We conclude by presenting future extensions of MSnbase and highlight its usage in reproducible proteomics research.
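
    As a hedged sketch of the workflow described above (the file name is a placeholder and the parameter choices are illustrative, not those of the published spiked-in data set), a typical MSnbase session reads raw MS2 spectra, quantifies the iTRAQ reporter ions and normalises the result, with every step logged in the object.

        library("MSnbase")

        ## Read raw MS2-level spectra; "itraq_run.mzXML" is a placeholder
        ## file name, not a file distributed with the package.
        raw <- readMSData("itraq_run.mzXML", msLevel. = 2)

        ## Quantify the four iTRAQ reporter ions by trapezoidation and
        ## normalise the resulting MSnSet, here with quantile normalisation.
        qnt <- quantify(raw, method = "trapezoidation", reporters = iTRAQ4)
        qnt <- normalise(qnt, method = "quantiles")

        ## Processing steps are recorded in the object itself for traceability.
        processingData(qnt)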

    Assessing the Applicability of the GTR Nucleotide Substitution Model Through Simulations

    The General Time Reversible (GTR) model of nucleotide substitution is at the core of many distance-based and character-based phylogeny inference methods. The procedure described by Waddell and Steel (1997) for estimating distances and instantaneous substitution rate matrices, R, under the GTR model is known to be inapplicable under some conditions, i.e., it leads to the inapplicability of the GTR model. Here, we simulate the evolution of DNA sequences along 12 trees characterized by different combinations of tree length, (non-)homogeneity of the substitution rate matrix R, and sequence length. We then evaluate both the frequency of the GTR model inapplicability for estimating distances and the accuracy of inferred alignments. Our results indicate that inapplicability of Waddell and Steel's procedure can be considered a real practical issue, and illustrate that the probability of this inapplicability is a function of substitution rates and sequence length.
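
    To make the notion of inapplicability concrete, the following R sketch (our own illustration, not code from the study) computes the pairwise GTR distance in the form commonly attributed to Waddell and Steel (1997), d = -trace(Pi log(Pi^-1 F)), and returns NA when the matrix logarithm, and hence the procedure, is inapplicable; the divergence matrix is invented for illustration.

        ## Fmat: observed 4x4 divergence (site pattern) matrix between two
        ## aligned sequences; Pi: diagonal matrix of base frequencies.
        gtr_distance <- function(Fmat) {
          Fmat <- (Fmat + t(Fmat)) / 2 / sum(Fmat)   ## symmetrise, normalise
          Pi   <- diag(rowSums(Fmat))
          e    <- eigen(solve(Pi) %*% Fmat)
          ## The matrix logarithm is undefined for complex or non-positive
          ## eigenvalues: this is the inapplicability discussed above.
          if (any(Im(e$values) != 0) || any(Re(e$values) <= 0))
            return(NA_real_)
          logM <- e$vectors %*% diag(log(Re(e$values))) %*% solve(e$vectors)
          -sum(diag(Pi %*% logM))
        }

        ## Toy counts of aligned site patterns (rows/columns: A, C, G, T).
        Fmat <- matrix(c(90,  3,  7,  2,
                          3, 85,  2,  6,
                          7,  2, 88,  3,
                          2,  6,  3, 92), nrow = 4, byrow = TRUE)
        gtr_distance(Fmat)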

    Mapping the sub-cellular proteome

    In biology, localisation is function. Cells display a complex sub-cellular structure with numerous distinct niches responsible for specific biological processes. Consequently, not only must proteins be present in a cell to accomplish their biological functions, but they must be localised in their intended sub-cellular locations. In contrast, mis-localised proteins can have serious adverse consequences. In this talk, I will present how contemporary experimental and computational technologies can be used to produce proteome-wide protein localisation maps.
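
    For readers wanting a concrete entry point, a hedged sketch using the Bioconductor pRoloc and pRolocdata packages (standard tools for this kind of spatial proteomics mapping; the example data set and parameters are illustrative, not material from the talk) visualises a published localisation map and classifies unlabelled proteins against curated organelle markers.

        library("pRoloc")
        library("pRolocdata")

        ## Example spatial proteomics experiment: protein profiles along a
        ## density gradient with curated organelle marker annotations.
        data(tan2009r1)

        ## Visualise the sub-cellular map on a PCA projection, coloured by
        ## marker class, then classify unlabelled proteins with kNN.
        plot2D(tan2009r1, fcol = "markers")
        res <- knnClassification(tan2009r1, fcol = "markers", k = 5)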

    Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation, have compared them on real or simulated data sets and have recommended a list of missing value imputation methods for proteomics applications. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: for instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context. This work was supported by the following funding: ANR-2010-GENOM-BTV-002-01 (Chloro-Types), ANR-10-INBS-08 (ProFI project, “Infrastructures Nationales en Biologie et Santé”, “Investissements d’Avenir”), EU FP7 program (Prime-XS project, Contract no. 262067), the Prospectom project (Mastodons 2012 CNRS challenge), and the BBSRC Strategic Longer and Larger grant (Award BB/L002817/1). This is the final version of the article. It first appeared from the American Chemical Society via https://dx.doi.org/10.1021/acs.jproteome.5b0098
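
    To illustrate the "appropriate method for the appropriate mechanism" point above, the following hedged sketch (invented toy data, not the study's pipeline) uses MSnbase's impute() to contrast a neighbour-based method, suited to values missing at random, with a low-value method such as MinDet, suited to left-censored, intensity-dependent missingness.

        library("MSnbase")

        ## Toy peptide-by-sample intensity matrix with missing values.
        set.seed(1)
        e <- matrix(rlnorm(60, meanlog = 5), nrow = 20,
                    dimnames = list(paste0("pep", 1:20), paste0("S", 1:3)))
        e[sample(length(e), 10)] <- NA
        msn <- MSnSet(exprs = e,
                      fData = data.frame(row.names = rownames(e)),
                      pData = data.frame(row.names = colnames(e)))

        ## Missingness assumed (mostly) random: neighbour-based imputation
        ## (uses the Bioconductor 'impute' package under the hood).
        imp_mar <- impute(msn, method = "knn")

        ## Left-censored, intensity-dependent missingness: a low-value
        ## strategy such as MinDet (uses the 'imputeLCMD' package).
        imp_mnar <- impute(msn, method = "MinDet")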

    Effects of Traveling Wave Ion Mobility Separation on Data Independent Acquisition in Proteomics Studies

    qTOF mass spectrometry and traveling wave ion mobility separation (TWIMS) hybrid instruments (q-TWIMS-TOF) have recently become commercially available. Ion mobility separation allows an additional dimension of precursor separation inside the instrument, without incurring an increase in instrument time. We comprehensively investigated the effects of TWIMS on data-independent acquisition on a Synapt G2 instrument. We observed that if fragmentation is performed post TWIMS, more accurate assignment of fragment ions to precursors is possible in data-independent acquisition. This allows up to 60% higher proteome coverage and higher confidence of protein and peptide identifications. Moreover, the majority of peptides and proteins identified upon application of TWIMS span the lower intensity range of the proteome. It has also been demonstrated in several studies that employing IMS results in higher peak capacity of separation and consequently more accurate and precise quantitation of lower intensity precursor ions. We observe that employing TWIMS results in an attenuation of the detected ion current. We postulate that this effect is twofold: sensitivity is reduced due to ion scattering during transfer into a high-pressure “IMS zone”, and due to the saturation of the detector digitizer as a result of the IMS concentration effect. This latter effect limits the useful linear range of quantitation, compromising quantitation accuracy of high intensity peptides. We demonstrate that the signal loss from detector saturation and transmission loss can be deconvoluted by investigation of the peptide isotopic envelope. We discuss the origin and extent of signal loss and suggest methods to minimize these effects on the q-TWIMS-TOF instrument in the light of different experimental designs and other IMS/MS platforms described previously.
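
    As a loose, hedged illustration of the isotopic envelope idea above (our own sketch, with invented numbers and a crude averagine-style Poisson approximation; it is not the authors' algorithm), one can flag possible digitizer saturation when the observed share of the expected most intense isotope falls well below its theoretical share.

        ## Expected relative intensities of the first n isotope peaks of a
        ## peptide of a given mass, using a rough Poisson approximation
        ## (about 0.000475 expected 13C atoms per Da of peptide mass).
        expected_envelope <- function(mass, n = 4) {
          lambda <- 0.000475 * mass
          p <- dpois(0:(n - 1), lambda)
          p / sum(p)
        }

        ## Flag possible saturation when the observed share of the expected
        ## most intense peak is much lower than predicted (tolerance 'tol').
        flag_saturation <- function(observed, mass, tol = 0.8) {
          obs  <- observed / sum(observed)
          expd <- expected_envelope(mass, length(observed))
          top  <- which.max(expd)
          obs[top] / expd[top] < tol
        }

        ## Invented example: a 2 kDa peptide whose monoisotopic peak is capped.
        flag_saturation(observed = c(3.0e5, 3.8e5, 2.2e5, 0.9e5), mass = 2000)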